Learning to Reconstruct 3D Human Pose and Motion from Silhouettes
Authors
Abstract
We will describe our ongoing work on learning-based methods for recovering 3D human body pose and motion from single images and from monocular image sequences. The methods work directly with raw image observations and require neither an explicit 3D body model nor a prior labelling of body parts in the image. Instead, they recover the body pose or motion by direct nonlinear regression against shape descriptors extracted automatically from image silhouettes or contours. For improved resistance to segmentation errors and occlusions, we use a robust shape representation: histograms of locally supported shape-context descriptors. The image description is thus related to (generalized and robustified versions of) Brand's 'Shadow Puppetry' [1], Mori & Malik's shape context method [2] and Shakhnarovich et al.'s contour-based method [3]. We regress the current pose (body joint angles) against both silhouette shape and (in the tracking-based schemes) the previous 1–2 poses. The regression is nontrivial owing to high dimensionality, sparse training data, and the fact that recovering pose from monocular image observations is inherently multi-valued because of pervasive kinematic ambiguities. For tracking, we evaluated several different regression dependency structures designed to reduce these reconstruction ambiguities while capturing the dynamics, the observations and the correcting effect of observation updates (all nonlinear and unknown a priori). We tested a number of different regression methods on these problems, including regularized least squares, Support Vector Regression and Relevance Vector Machine (RVM) regression [4], over both linear and kernel bases. In general, the kernelized methods do best, and the RVM framework provides much sparser regressors without compromising performance. However, linear least squares (over our very nonlinear shape description) also performs surprisingly well.
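As a rough illustration of one of the regressors named above, the sketch below fits a regularized least squares model over a Gaussian kernel basis, mapping descriptor vectors to pose vectors. It is not the authors' implementation: the function names, the `gamma` and `lam` values, and the synthetic data standing in for silhouette descriptors and joint angles are all assumptions made for this example.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def fit_kernel_ridge(Z, Y, gamma, lam):
    """Solve (K + lam * I) W = Y for the kernel weights W."""
    K = gaussian_kernel(Z, Z, gamma)
    return np.linalg.solve(K + lam * np.eye(len(Z)), Y)

def predict(Z_train, W, Z_new, gamma):
    """Predict pose vectors for new descriptors from the fitted weights."""
    return gaussian_kernel(Z_new, Z_train, gamma) @ W

# Synthetic stand-in data (hypothetical): 10-D "descriptors", 2-D "poses".
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 10))
Y = np.stack([np.sin(Z[:, 0]), np.cos(Z[:, 1])], axis=1)

Z_train, Y_train, Z_test = Z[:150], Y[:150], Z[150:]
W = fit_kernel_ridge(Z_train, Y_train, gamma=0.1, lam=1e-3)
pred = predict(Z_train, W, Z_test, gamma=0.1)
```

For a tracking-style variant, one option would be to concatenate the previous pose estimates onto each descriptor row before fitting; how the inputs are actually combined in the schemes described here is not specified in this abstract.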
If time permits, we will also sketch our novel scalable continuation-based RVM training algorithm. The methods are trained using real human motion capture data, to ensure that they capture both the global structure and the fine details of human motion. However, to improve model coverage and make the most of the limited amount of training data available, we currently re-synthesize the corresponding training images from a range of different viewpoints. We have tested our models both quantitatively on independently captured test sequences and qualitatively on videos of typical human motions. On the test sequences, we are currently getting mean angular errors of about 6–7 degrees, a factor of about 3 better than the current state of the art for the much simpler upper-body-only problem.
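The abstract does not say exactly how the mean angular error is computed. One plausible reading, sketched here purely as an assumption, averages the absolute per-joint angle difference after wrapping each difference into [-180, 180) degrees, so that 350 and 10 degrees count as 20 degrees apart. The function name is hypothetical.

```python
import numpy as np

def mean_angular_error(pred_deg, true_deg):
    """Mean absolute angle difference in degrees, with wraparound handled."""
    diff = np.asarray(pred_deg, dtype=float) - np.asarray(true_deg, dtype=float)
    wrapped = (diff + 180.0) % 360.0 - 180.0  # map each difference into [-180, 180)
    return float(np.mean(np.abs(wrapped)))

# Two joints whose estimates each differ from the truth by 20 degrees across the 0/360 boundary.
err = mean_angular_error([350.0, 10.0], [10.0, 350.0])
```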
Similar resources
Learning to Reconstruct 3D Human Motion from Bayesian Mixtures of Experts. A Probabilistic Discriminative Approach
We describe a mixture density propagation algorithm to estimate 3D human motion in monocular video sequences, based on observations encoding the appearance of image silhouettes. Our approach is discriminative rather than generative, therefore it does not require the probabilistic inversion of a predictive observation model. Instead, it uses a large human motion capture data-base and a 3D comput...
3D Human Pose Estimation from Monocular Image Sequences
Automatic 3D reconstruction of human poses from monocular images is a challenging and popular topic in the computer vision community, which provides a wide range of applications in multiple areas. Solutions for 3D pose estimation involve various learning approaches, such as Support Vector Machines and Gaussian processes, but many encounter difficulties in cluttered scenarios and require additio...
An Augmented Reality Based User Interface Model for Full Body Interaction using Monocular 3D Pose Estimation
In this paper, we present a conceptual design of augmented reality full body interaction based on monocular 3D pose estimation. The proposed design is based on 3D pose estimation from the image of user’s motions captured by a monocular camera and the processing of 3D human poses for augmented reality applications. Based on the method, a 3D human full body model is constructed. The silhouettes e...
Motion capture and human pose reconstruction from a single-view video sequence
We propose a framework to reconstruct the 3D pose of a human for animation from a sequence of single-view video frames. The framework for pose construction starts with background estimation and the performer's silhouette is extracted using image subtraction for each frame. Then the body silhouettes are automatically labeled using a model-based approach. Fin...
A Dual-Source Approach for 3D Human Pose Estimation from a Single Image
In this work we address the challenging problem of 3D human pose estimation from single images. Recent approaches learn deep neural networks to regress 3D pose directly from images. One major challenge for such methods, however, is the collection of training data. Specifically, collecting large amounts of training data containing unconstrained images annotated with accurate 3D poses is infeasib...
Video Subject Inpainting: A Posture-Based Method
Despite recent advances in video inpainting techniques, reconstructing large missing regions of a moving subject while its scale changes remains an elusive goal. In this paper, we have introduced a scale-change invariant method for large missing regions to tackle this problem. Using this framework, first the moving foreground is separated from the background and its scale is equalized. Then, a ...
Journal title:
Volume, issue:
Pages: -
Publication date: 2004